Project: Investigate a Dataset

Investigation of the Soccer Dataset

Table of Contents

Introduction

This investigation uses the soccer dataset hosted on Kaggle (https://www.kaggle.com/datasets/hugomathien/soccer). It analyses the creative attributes of teams that win championships, focusing on the English Premier League (EPL) over the period 2009 to 2015.

The pandas documentation was used for reference (https://pandas.pydata.org/docs/).

The plotly documentation was used for plotting reference (https://plotly.com/python/radar-chart/).

Research Question 1: Are Premier League winning teams characterized primarily by creativity, team cohesion and fluidity?

Research Question 2: Based on the result of question 1, can we claim that seasons in which the champions were not at the top of the creativity charts were more competitive?

Data Wrangling


General Properties

Data Cleaning

Check for duplicates within the dataset
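The duplicate check can be sketched with pandas as follows. This is a minimal illustration on a toy DataFrame with placeholder values, not the notebook's original code; the variable names are assumptions.

```python
import pandas as pd

# Toy stand-in for the team table; the values are placeholders, not real data.
df = pd.DataFrame({
    "team_long_name": ["Team A", "Team A", "Team B"],
    "date": ["2010-02-22", "2010-02-22", "2010-02-22"],
    "buildUpPlaySpeed": [60, 60, 55],
})

# duplicated() flags every row that exactly repeats an earlier row.
n_duplicates = int(df.duplicated().sum())
print(f"duplicate rows: {n_duplicates}")

# drop_duplicates() keeps the first occurrence of each repeated row.
df = df.drop_duplicates().reset_index(drop=True)
```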

Check for columns with null values
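One way to list the columns that contain null values is shown below. It is a sketch on a toy DataFrame with placeholder values; the names `null_counts` and `columns_with_nulls` are introduced here for illustration only.

```python
import pandas as pd

# Toy frame with one missing value; the numbers are placeholders.
df = pd.DataFrame({
    "team_long_name": ["Team A", "Team B", "Team C"],
    "buildUpPlaySpeed": [60.0, None, 55.0],
    "chanceCreationPassing": [50.0, 45.0, 70.0],
})

# isnull().sum() counts the missing values in each column.
null_counts = df.isnull().sum()

# Keep only the names of the columns that have at least one null.
columns_with_nulls = null_counts[null_counts > 0].index.tolist()
print(columns_with_nulls)
```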

Replace the columns with null values

Since the dataset is primarily numeric, we can replace the null values with zero to keep the data types consistent.

We will only replace the nulls in the columns we picked to answer our questions. Later we will drop the columns that aren't used.

Confirm the replacement worked on the relevant columns
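The replacement and its confirmation might look like the sketch below. The toy values are placeholders, and `buildUpPlayDribbling` stands in for any column outside the selection; whether the original notebook used `fillna` in exactly this way is an assumption.

```python
import pandas as pd

# The four attribute columns chosen to answer the research questions.
attribute_columns = [
    "buildUpPlaySpeed",
    "chanceCreationPassing",
    "chanceCreationCrossing",
    "chanceCreationShooting",
]

# Toy data with placeholder values.
df = pd.DataFrame({
    "team_long_name": ["Team A", "Team B"],
    "buildUpPlaySpeed": [60.0, None],
    "chanceCreationPassing": [None, 45.0],
    "chanceCreationCrossing": [50.0, 65.0],
    "chanceCreationShooting": [55.0, None],
    "buildUpPlayDribbling": [None, None],  # outside the selection, left untouched
})

# Fill nulls with zero in the selected columns only.
df[attribute_columns] = df[attribute_columns].fillna(0)

# Confirm that no nulls remain in the selected columns.
remaining_nulls = int(df[attribute_columns].isnull().sum().sum())
print(f"nulls left in selected columns: {remaining_nulls}")
```

Note that a zero fill lowers any average later computed over these columns, so it is worth checking how many values were actually filled.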

Finally, we drop the unused columns from our dataset so that only the relevant data remains.

Confirmation of the columns present in the new dataset
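Dropping columns and confirming the result can be sketched as follows. The list in `columns_to_drop` is an assumption chosen for illustration; the notebook's actual list of dropped columns is not shown in this report.

```python
import pandas as pd

# Toy frame with two columns that are not needed for the analysis.
df = pd.DataFrame({
    "team_long_name": ["Team A"],
    "team_api_id": [1],
    "date": ["2010-02-22"],
    "buildUpPlaySpeed": [60],
    "buildUpPlayDribbling": [None],
    "defencePressure": [40],
})

# Hypothetical list of columns that the research questions do not use.
columns_to_drop = ["buildUpPlayDribbling", "defencePressure"]
df = df.drop(columns=columns_to_drop)

# Confirm which columns remain.
print(df.columns.tolist())
```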

Save the new dataset to a csv file that we will use from this point forward
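Saving and reloading the trimmed dataset could be done as in this sketch. The filename `epl_team_attributes.csv` is hypothetical, and a temporary directory is used so that the example runs anywhere.

```python
import os
import tempfile

import pandas as pd

# Toy frame with placeholder values.
df = pd.DataFrame({
    "team_long_name": ["Team A", "Team B"],
    "buildUpPlaySpeed": [60, 55],
})

# Hypothetical output filename inside a temporary directory.
output_path = os.path.join(tempfile.mkdtemp(), "epl_team_attributes.csv")

# index=False avoids writing the row index as an extra unnamed column.
df.to_csv(output_path, index=False)

# Reload the file to work from the saved copy from this point forward.
reloaded = pd.read_csv(output_path)
print(reloaded.shape)
```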

Rename the columns

team_long_name refers to the football club's full name

team_fifa_api_id_x refers to the identifier FIFA assigns to the football club

team_api_id refers to the dataset's own identifier for each football club

date refers to the date on which the record was captured, which identifies the season

buildUpPlaySpeed refers to the speed of a team's link-up play leading to a goal scoring opportunity (this figure is most likely an average over all the matches within a season)

chanceCreationPassing refers to the creation of passing chances per season (this figure is most likely an average over all the matches within a season)

chanceCreationCrossing refers to the creation of crossing chances per season (this figure is most likely an average over all the matches within a season)

chanceCreationShooting refers to the creation of shooting chances per season (this figure is most likely an average over all the matches within a season)
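Renaming the columns described above might look like the sketch below. The new names in `new_names` and the derived `season_year` column are hypothetical; the report does not state the names the notebook actually used.

```python
import pandas as pd

# Toy row using the original column names; the values are placeholders.
df = pd.DataFrame({
    "team_long_name": ["Team A"],
    "team_fifa_api_id_x": [1],
    "team_api_id": [10],
    "date": ["2010-02-22 00:00:00"],
    "buildUpPlaySpeed": [60],
})

# Hypothetical, more readable names for illustration.
new_names = {
    "team_long_name": "club",
    "team_fifa_api_id_x": "fifa_id",
    "team_api_id": "team_id",
    "buildUpPlaySpeed": "build_up_play_speed",
}
df = df.rename(columns=new_names)

# Assumption: the year of the record date is used as the season label.
df["season_year"] = pd.to_datetime(df["date"]).dt.year
print(df.columns.tolist())
```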

Exploratory Data Analysis

Research Question 1: Are premier league winning teams characterized primarily by creativity, team cohesion and fluidity?

A comparative analysis of the EPL champions between 2009 and 2015
Team attributes being compared:
  1. Build up play speed
  2. Chance creation (passing)
  3. Chance creation (crossing)
  4. Chance creation (shooting)

Line graph plots for the data

Based on the build up play speed chart, the premise of our initial question does not always hold: in 2011, 2012 and 2014, the EPL champions were less direct in their build up play than the other two teams that won frequently between 2010 and 2015.

For passing chance creation, the teams that won the EPL in the 2011, 2012, 2014 and 2015 seasons were not at the top of the chart. The seasons 2011, 2012 and 2014 appear once more.

Crossing chance creation starts off high before dropping for all teams as the seasons progress. Here the champion does not lead the chart in 2012 and 2014, two seasons that also appeared in the previous visualizations.

For the seasons 2011, 2012, 2013 and 2014, the champion does not create the most shooting chances.
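The chart readings above can also be checked in code. The sketch below flags, per season and attribute, whether the champion holds the highest value among the compared clubs. All numbers, club names and the `champions` mapping are placeholders for illustration, not values from the dataset.

```python
import pandas as pd

# Placeholder values for illustration only; not taken from the dataset.
df = pd.DataFrame({
    "season": [2011, 2011, 2012, 2012],
    "club": ["Club A", "Club B", "Club A", "Club B"],
    "buildUpPlaySpeed": [65, 50, 40, 60],
    "chanceCreationShooting": [70, 55, 45, 50],
})
champions = {2011: "Club A", 2012: "Club A"}  # hypothetical season -> champion
attributes = ["buildUpPlaySpeed", "chanceCreationShooting"]

# Highest value of each attribute within each season, aligned to every row.
season_max = df.groupby("season")[attributes].transform("max")

# True where a club matches the seasonal maximum for that attribute.
is_top = df[attributes].eq(season_max)

# Keep only the champion's row for each season, indexed by season.
is_champion = df["club"] == df["season"].map(champions)
champion_leads = is_top[is_champion].set_index(df.loc[is_champion, "season"])
print(champion_leads)
```

A season in which the champion's row is False for an attribute corresponds to a season named in the observations above.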

Histogram Plots for the data


The radar chart above shows the progression of build up play speed per club over the period 2010 to 2015. Hovering over a coloured section reveals the club.

The radar chart above shows the progression of passing chance creation per club over the period 2010 to 2015. Hovering over a coloured section reveals the club.

The radar chart above shows the progression of crossing chance creation per club over the period 2010 to 2015. Hovering over a coloured section reveals the club.

The radar chart above shows the progression of shooting chance creation per club over the period 2010 to 2015. Hovering over a coloured section reveals the club.

Conclusions

Across all the visualizations, the seasons that appear most often are:
  1. 2012 (4 occurrences)
  2. 2014 (4 occurrences)
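The occurrence counts can be reproduced by tallying the seasons named in the four attribute observations earlier in this report. The sketch below uses only those season lists; the dictionary name `seasons_not_leading` is introduced here for illustration.

```python
from collections import Counter

# Seasons in which the champion did not lead each attribute,
# as listed in the observations for the four charts.
seasons_not_leading = {
    "buildUpPlaySpeed": [2011, 2012, 2014],
    "chanceCreationPassing": [2011, 2012, 2014, 2015],
    "chanceCreationCrossing": [2012, 2014],
    "chanceCreationShooting": [2011, 2012, 2013, 2014],
}

# Count how many of the four attributes mention each season.
occurrences = Counter(
    season
    for seasons in seasons_not_leading.values()
    for season in seasons
)

for season, count in sorted(occurrences.items()):
    print(season, count)
```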

In the 2012 season the champion, Manchester City, won only on goal difference, while in the 2014 season Manchester City again won narrowly, by a 2 point gap. Thus in most seasons between 2010 and 2015 the EPL champion does not lead in every team attribute studied, and the 2012 and 2014 seasons show greater competition for the title.

The limitation of this exploration is that no clear correlation between a team's creative dominance and winning the championship was established, which makes the exploration more speculative. In addition, since the exploration is focussed on the eventual champions, teams that are known for creative football philosophies but did not win the title aren't represented within this study.